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Abstract 

We present an efficient low-rank updating algorithm for updating the trial wavefunctions used 
in Quantum Monte Carlo (QMC) simulations. The algorithm is based on low-rank updating of the 
Slater determinants. In particular, the computational complexity of the algorithm is 0{kN) during 
the k-ih. step compared with traditional algorithms that require 0{N'^) computations, where N is 
the system size. For single determinant trial wavefunctions the new algorithm is faster than the 
traditional 0{N'^) Sherman-Morrison algorithm for up to 0{N) updates. For multideterminant 
configuration-interaction type trial wavefunctions of M + 1 determinants, the new algorithm is sig- 
nificantly more efficient, saving both 0{MN'^) work and 0{MN'^) storage. The algorithm enables 
more accurate and significantly more efficient QMC calculations using configuration interaction 
type wavefunctions. 
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I. INTRODUCTION 



Quantum Monte Carlo (QMC) is an approach capable of yielding highly accurate re- 
sults in systems ranging from isolated molecules to the solid state The success of most 
common QMC methods, namely variational Monte Carlo (VMC) and diffusion Monte Carlo 
(DMC), depends crucially on the choice of trial wavefunction. Indeed, the trial wavefunction 
limits both the statistical efficiency and accuracy of the simulation. In QMC methods, the 
evaluation of the trial wavefunction becomes the most demanding part of the calculation es- 
pecially when sufficiently large systems are considered or accurate simulations are required. 
This aspect of QMC was recognized even in the earliest DMC calculations, e.g. Ref. j^. 
Consequently, the choice of trial wavefunction used in QMC calculations is motivated both 
by the accuracy and the speed of evaluation. 

The most common form of trial wavefunction is of the Slater- Jastrow type 

*(R) = D(R)e-'(^), (1) 

where, neglecting spin, -D(R) is a Slater determinant, J(R) is a Jastrow function, and 
R = {ri,r2, ...,r7v} is a vector of the position rj of each electron. In QMC, the simulation 
commonly proceeds by proposing a local change to the electronic system configuration R 
to R'. This local change in R i-^ R' is induced by the movement of one electron at a 
time from position to r^. The probability that the proposed local change is accepted 
is dependent on the transition probability, which depends on the ratio of \E'(R')/\I'(R), 
where R' is a new set of electron positions. This transition probability computation in 
turn requires the computation of the ratio of determinants -D(R') and -D(R) in the new 
and old configurations respectively. Although a complete re-computation of -D(R') can 
be made, an efficient algorithm that computes the necessary ratio D(T{.')/ D(Ii) without 
resorting to a complete independent calculation of each determinant can significantly increase 
the overall efficiency of QMC simulations. Indeed, this efficiency measure is essential to 
the success of Slater- Jastrow wavefunctions; for a single electron move, the conventional 
algorithms use Sherman-Morrison formula (special case of Sherman-Morrison- Woodbury 
formula [sl]) which reduces the cost of evaluating D(R')/D(R) to 0{N), with an 0{N'^) cost 
if the move is accepted, compared with 0{N^) for a naive evaluation of the determinant. 
Once the ratio of the determinants has been calculated, most quantities required in the 
Monte Carlo can be obtained through a simple multiplicative scaling|l|. Comparatively 



recently, "linear sca^ 



iai scaling 
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ing" approaches have been developed to reduce the cost of evaluating the 



determinants [4, 15|, 16|, 17|, ISj by exploiting spatial locality in the studied physical system. In this 
paper, we explore alternative and complementary approaches to speedup the computation 
of transition probabilities and determinant ratios in QMC calculations. 

The most accurate and commonly used QMC method is the DMC method performed 
in the fixed node approximation. This method exhibits a varational error in the energy 
depending on the quality of the nodal surfaces (zeroes) of the trail wavefunction. To im- 
prove the nodes as well as the variational quality of the trial function, it is now routine to 
utilize multiple determinant trial functions. These are commonly obtained from multiconfig- 
uration quantum chemistry approaches such as the configuration interaction method where 
the ground state determinant D is supplemented by single and double excitations from the 
ground state. That is. 



a,c 



a,b,c,d 



(2) 



where D'^f denotes a double excitation with orbitals a and b replaced by c and d respectively, 
and a and (3 denote the multi-determinant expansion coefficients. Higher order excitations 
may be progressively included. Such an expansion of the wavefunction allows the nodal 
surface to be improved. 

There are many strong motivations for minimizing the computational cost of multide- 
terminant wavefunctions in QMC: Recent benchmark tests of the accuracy achievable in 
all electron VMC utilized, for example, up to 499 determinants to obtain over 90% of the 
correlation energy in the first row atoms jsj. To obtain a similar fraction of correlation 
energy in larger systems, more determinants are likely required. Numerous recent stud- 



ies |ld . I 111 . Il2| have shown the utility of increased numbers of determinants for improved 



accuracy in atomic, molecular, and solid-state applications. In general this result is ex- 
pected since quantum chemical techniques systematically improve the wavefunction with 
increased numbers of determinants. Improved trial wavefunctions using multideterminants 
are required for large systems such as the Cqq fuUerene where current trial wavefunctions are 
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tiple determinants may also 



insufficient for computing accurate optical properties ISj. Mu 

be required to represent certain spin symmetries, e.g. Ref. jl4]. Additionally, we have also 
recently shown that it is possible to sample the ground state wavefunction into a configu- 
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ration expansion|l^ and subsequently improve the trial wavefunction 16]. This application 
requires the use of large configuration interaction expansions consisting of potentially thou- 
sands of determinants. 

In this paper we propose an efficient algorithm for utilizing Slater- Jastrow trial wave- 
functions in QMC simulations. The algorithm is particularly efficient for multideterminant 
wavef unctions. Extension to related alternative wavefunction forms such as multi-pfaffian 
and multi-backflow wavefunctions is straightforward. In Section [TTl we present the details 
of the algorithm. Section IIIII presents benchmark timing and efficiency measures for sin- 
gle determinant calculations using a variety of system sizes. The multideterminant case is 
analysed in Section [IVl Conclusions are given in Section El 



II. ALGORITHMS FOR UPDATING SLATER DETERMINANTS 



As mentioned earlier, in QMC, the Monte Carlo simulation proceeds by proposing a local 
change to the electronic system configuration R to R'. The acceptance criterion for each such 
local change follows the traditional Metropolis algorithm, which requires the computation 
of the transition probability. Each time a local change is accepted, the Slater matrix D(R) 
is updated to D(R') by modifying one of the rows of D(R) corresponding to an electron 
movement from r to r'. The simulation then proceeds by proposing a new local change, which 
requires the re-computation of the determinant of Slater matrix D(R") in the subsequent 
configuration R". This progression of the simulation via local changes typically proceeds for 
many thousands to millions of steps until observables such as the total energy converge to 
a desired statistical accuracy. 

The slater matrix in configuration R = (ri, r2, ■ ■ ■ , r^v) is given by 



DfR) 



h{r2) 02(r2) 

^l(rAr) 02(riv) 
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(3) 



where rj and (pi for i = 1,2, ... ,N indicate respectively the spatial coordinates and spin- 
orbitals of i-th electron. Because we are moving a single electron (say p-th electron) at a 
time from position i-^ r^, the Slater matrix in the new electronic configuration R' = 
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r2, ■ ■ ■ , Tp, ■ ■ ■ , 



tn) is simply obtained by modifying the p-th row as 
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The Metropohs probabihty to accept or reject the move is dependent on the ratio of determi- 



and -D(R) is the determinant of D(R). The transition probabihty is in general proportional 
to l-Rp, assuming real wavef unctions. If the move is accepted, then the system configuration 
changes to R'; if not, the move is rejected and the system remains in configuration R. In 
the following, whenever the context is clear, we denote -D(R') by D' and -D(R) by D. 

For the Monte Carlo simulation to be efficient, all quantities related to the transition 
probability and any observables must be computed with minimum computational operations. 
In the case of a single determinant wavefunction the ratio D'/D is required. For a single 
electron move this corresponds to a change of a single row in the Slater matrix. However, for 
the case of the multideterminant wavefunction, as in Eq. [2], all ratios D^/D'^ and D^f / -^ab 
are required. These ratios involve determinants with both orbital replacements and single 
electron moves (i.e., both row and column changes) when compared to the original ground 
state determinant D. 

For a single electron move, the basic computational problem involved during the {k + 1)- 
th MC step may be expressed as: Given the determinanat of Slater matrix D^, compute 
the determinant Dk+i of D^+i such that 



where p{k) defines an index vector that maps k \—>- p such that p{k) = p, and denotes an 
unit vector with 1 on the p-th entry and everywhere else. The vector corresponds to the 
change in Slater matrix due to the displacement of the p-th electron during the {k + l)-th 
MC step and is given by 



nants of Slater matrices R 



, where -D(R') is the determinant of Slater matrix D(R') 



Dfc + ep(fc)V* 



(5) 




(6) 
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A straight-forward computation of Dk+i may be obtained as 



Dk+i 



;i+vtD,iep(,)) D, 



(7) 



and Rk 



D 



can be evaluated as 



Rk = (l + vtD,iep(fc)^ 



(8) 



Hence, for any given k, Rk can be evaluated efficiently in 0{N) computations since 
Dj^^ep(fc) can be interpreted as the p-th column of i.e., D^^(:,p) = D^^ep(fc). However, 
repetitive computation of Rk during each of the MC simulation steps (for k = 0, 1,2, . . .) 
requires an efficient procedure to compute D^^ for each k. For this purpose, traditional 
algorithms employ the Sherman-Morrison formula to update D 
expressed as 



D^^^, which can be 



D 



fc^tJp(fe)Vfc 



vlD: 



'i + vlD: 



-1, 



3p(fc)J 



(9) 



Using this formula, D^^ can be updated to D^^^^ in O^N"^) computations. However, since 
the required number of MC steps in a typical Monte Carlo simulation readily extends to the 
thousands to millions range, and can increase with increasing system sizes, (9(A^^) scaling of 
these traditional algorithms poses a significant hindrance for the simulation of large system 
sizes despite the fact that such large scale simulations are necessary to develop a better 
understanding of relevant chemistry and physics. As discussed in the introduction. Sec [H 
multiple determinants compound this problem. 

Alternatively, an efficient recursive algorithm for computing D^^^ may be formulated by 
expressing Eq. [9] as 



I- 



D 



k ep(fc)Vfc 



'l + vlD: 



ip{k)) 



(10) 



where = D^^ep(fc) and 7fc = Based on Eq. Uni a recursive scheme for computing 
Dj^^j^ may be formulated as 



(I - 7fcUfcV* ) ... (I - 7oUov* ) Do 1 



n(i 



(11) 



An 0{kN) recursive algorithm based on Eq. [TT] is presented in Algorithm [H For each 
additional step k, this algorithm requires storage space for two vectors and of size A^. 
In addition, we need to store an index vector p(/c) that maps /c ^— > p such that p{k) = p. 

III. SINGLE DETERMINANT BENCHMARKS 

In order to compare the computational efficiency of the recursive algorithm with the 
traditional algorithm, we first tested the case of a single determinant wavefunction. An 
analysis of the multideterminant case is given in Sec. IIV[ 

We tested the algorithms on a randomly generated matrix Dq. That is, since the algo- 
rithms are applicable for general matrices, we start with a matrix Dq whose elements are 
randomly chosen between zero and one. Then we consider rank-1 updates of Dq as given 
by Eq. [5] for m number of steps. The site locations p are chosen sequentially, modulo A^, 
for these m steps. The updated orbitals are chosen randomly. Figure [1] presents the ratio of 
the computational timings obtained using the full matrix updating and recursive updating 
algorithms. The timings were obtained using a standalone benchmark code using double 
precision arithmetic. We used the same data structures both in our recursive and full QMC 
simulations. Machine optimized linear algebra library calls were used for both algorithms. 
Timings were obtained on a 2.73 GHz Intel Xeon processor with 12 MB Cache. 

Examining the timings shown in Fig. [1], we see that the recursive update algorithm 
is always significantly faster than the Sherman-Morrison algorithm for a small number of 
updates. For up to ten updates, the new algorithm ~ 10 times faster for a 100 sized matrix, 
while for a 6400 sized matrix the new algorithm is ~ 1000 times faster. For increased numbers 
of updates the ratio of timings decreases. The crossover between the two algorithms occurs 
near the theoretically expected k = N updates. 

Due to the iterative nature of both algorithms numerical errors accumulate over time. It 
is common practise in QMC simulations to fully recalculate the inverse cofactor matrices 
from time to time to limit these errors. Such a recalculation requires 0{N^) operations. We 
have compared the numerical errors of the recursive update algorithm with the Sherman- 
Morrison algorithm and find the performance to be similar. Figure [2] illustrates the build 
up of errors for both algorithms for a single run. 

Figure [2] shows that both algorithms have good stability and on average give high accu- 
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racy, particularly for small numbers of updates. However, for both algorithms the average 
and maximum numerical error in the determinant ratio gradually increases with the number 
of updates and can become substantial. In both cases the maximum error for a fixed num- 
ber of updates can deviate by several orders of magnitude from the average. This behavior 
appears to be due to the occasional mixture of very small and very large numbers in the 
update formulae which results in a significant loss of precision. This data shows that while 
the recursive algorithm performs similarly to the Sherman-Morrison algorithm, it is vital to 
check sufficient accuracy is obtained if large numbers of updates are performed. 

IV. MULTIPLE DETERMINANT WAVEFUNCTIONS 

In the case of multiple determinant wavefunctions such as a configuration interaction 
expansion, all the excited Slater matrices D^(R) and DJj^(R) are similar to the ground 
state matrix D(R), and differ only by a few column interchanges. The use of the recursive 
algorithm provides an efficient way of calculating the transition probability compared to 
the traditional algorithm; It is not only faster but also requires reduced storage of (9(iV^) 
for storing only Dq ^ of the ground state matrix. No other potentially large data must be 
stored, although it is advantageous to reuse the current determinant values between MC 
steps. The recursive algorithm is used to compute the non-ground state determinants via 
column changes to the ground state matrix. The cost of each particle move is constant and 
does not increase when many steps are taken. 

For simplicity we analyse the case of a multiple determinant wavefunction consisting of 
only the ground state determinant and M determinants doubly excited from this state. 
Conventionally the Dq ^ as well as all the excited Slater matrix inverses are stored in mem- 
ory to enable fastest possible update using the traditional algorithm. When the recursive 
algorithm is applied to multiple determinant wavefunctions, we store only the Dq ^ of the 
ground state. Conventional updates are performed on this determinant and the recursive 
algorithm is used to compute the other excited determinants since the excited and ground 
state Slater matrices differ by a few column changes. Note that successive row updates 
can be performed in 0{kN) operations using an algorithm similar to that of Algorithm 
m However, successive row updates followed by multiple column updates always requires a 
O^N"^) cost associated with a matrix-vector multiplication. Since proposed moves are usu- 
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ally accepted in DMC calculations with an acceptance ratio of > 99%, it is convenient to use 
the conventional (Sherman-Morrison) algorithm to update the inverse of the ground state 
Slater matrix. It should also be noted that in the event the proposed move is accepted, the 
traditionally updated -D(R') used in the evaluation of the excited state determinants can be 
reused: the recursive update algorithm then requires no additional (9(A^^) work over a sin- 
gle determinant calculation. Consequently, using the recursive algorithm an M determinant 
wavefunction can be used with an updating cost scaling only linearly in M and system size 
compared to an A^^ scaling cost using the traditional algorithm. 

To evaluate determinant ratios such as -DJj^(R')/D^^(R) we first perform a traditional 
update to obtain D(R'). The recursive algorithm is then used to compute D^^(R') from 
D(R'). We assume that Djj^(R) is stored and available from a previous MC step, but this 
can also be calculated using two applications of the recursive algorithm to D(R). In Table 
[I we compare the costs of evaluating the determinant ratios in \E'(R')/\E'(R). Independent 
of the amount of storage chosen for the traditional scheme, the recursive scheme displays 
an improved computational cost by a factor 0{MN'^), or M times the cost of a complete 
single determinant update. The single determinant benchmarks of Sec. IIIII show that these 
updates, which are few in number and hence correspond to the left side of Fig. [H are several 
orders of magnitude faster than the traditional algorithm. 



V. CONCLUSIONS 



In this paper, we presented an efficient low-rank updating algorithm for QMC simula- 
tions. The algorithm requires only 0{kN) computations during A;-th MC step compared 
with 0{N^) computations required by traditional algorithms. Our numerical simulations 
indicate that for small numbers of updates of a single determinant this algorithm is orders 
of magnitude faster than traditional algorithms. For single determinant wavef unctions, the 
traditional algorithms remain the preferred choice for more than 0{N) updates. For mul- 
tideterminant wavefunctions of M + 1 determinants, our algorithm is the preferred choice, 
being significantly faster and of particular interest for large systems. In addition, it enables 
workspace to be reduced by a factor 0{MN'^). The speed and storage savings of this new 
algorithm enables QMC calculations to use thousands of determinants. 
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Algorithm 1 Recursive Algorithm (p-th electron moves) 

1: Given Dq ^ and 

2: Set p{k) =p 

3: Set Ufc = Dg ^ep(jfc) = 

4: for z = to A; — 1 do 

5: Compute = - 7^ (v^u^) 

6: end for 

7: Compute Rk = 1 + v|.Ufe 

8: if Accept then 

9: Compute 7fc = 

10: Save Ufc, v^, and 7jt 

11: k = k+1 

12: end if 
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FIG. 1: Relative timing of the recursive update algorithm to the traditional Sherman- Morrison 
algorithm for different matrix sizes. Ratios less than one indicate that the recursive algorithm is 
faster. 
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FIG. 2: Absolute numerical errors in computed determinant ratios using the recursive update (Rec) 
and Sherman-Morrison (SM) algorithms with double precision arithmetic for matrix sizes of 100 
(left) and 1000 (right). The behavior of the algorithms is similar. 
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Algorithm 



Move evaluation cost Move acceptance cost Storage cost 



Traditional 0{{1 + M)N) 

Minimum storage traditional 0((1 + 2M)N^ + (1 + M)N) 
Recursive 0{N^ + 3MN) 



0{{l + M)N^) 0{2{1 + M)N^) 



TABLE I: Cost of computing wavefunction ratios using traditional and recursive algorithms for 
proposed and accepted single electron moves. The wavefunction is of the configuration interaction 
doubles type consisting of M double excitations from a single ground state determinant. For the 
storage costs we consider only the most significant 0{N'^) and higher contributions. For at least 
the ground state determinant, both the full matrix and its inverse are stored resulting in the lead 
factor of 2 in the storage costs. In the traditional algorithm, the emphasis is on speed and hence 
all the excited state matrices and ground state matrix (along with its inverses) are stored. For the 
"minimum storage traditional", the emphasis is on limiting storage costs even at the expense of 
increased computational cost. Hence, in the "minimum storage traditional" algorithm, we assume 
that the traditional algorithm is used but only the ground state matrix is stored and the remaining 
matrices are computed based on the ground state matrix. 
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